
Draft: GPT2 training on MCQ med data #1111

Open
mina5rovic wants to merge 40 commits into develop from gpt2-training

Conversation

@mina5rovic

Collaborator

No description provided.

@mina5rovic mina5rovic requested a review from JulienVig April 16, 2026 11:51
@mina5rovic mina5rovic changed the title GPT2 training on MCQ med data Draft: GPT2 training on MCQ med data Apr 16, 2026

@JulienVig JulienVig left a comment

Collaborator


The logic of the benchmark and the memorization measurement looks good to me! I just have a few performance comments.

Comment thread cli/src/evaluate_finetuned_gpt2.ts Outdated
Comment thread cli/src/evaluate_finetuned_gpt2.ts Outdated

const logits = tfModel.predict(inputTensor) as tf.Tensor;
const logProbs = tf.logSoftmax(logits, -1);
const arr = await logProbs.array() as number[][][];

Collaborator


You're computing the log-softmax for every position, but you only need the last one, and line 74 materializes the whole array even though you only need 4 values. You could rewrite this logic to work with the last position only, for example:

const optionLogProbs = tf.tidy(() => {
    const logits = tfModel.predict(inputTensor) as tf.Tensor3D;   // [1, seqLen, vocab]
    const lastLogits = logits
        .slice([0, promptTokens.length - 1, 0], [1, 1, -1])        // final position only
        .reshape([-1]);                                            // [vocab]
    const logProbs = tf.logSoftmax(lastLogits);                    // [vocab]
    // continuationTokenIDs: array with the token ID of each of the 4 continuations
    return tf.gather(logProbs, continuationTokenIDs);              // [4]
});
const scores = await optionLogProbs.array() as number[];          // just 4 values
optionLogProbs.dispose();                                          // tf.tidy() keeps the returned tensor alive, so free it here
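The same last-position scoring can be sketched on plain arrays, without TensorFlow.js, to make the arithmetic explicit. This is a minimal sketch, not the PR's code: `scoreOptions`, `lastLogits`, and `optionTokenIds` are hypothetical names, and it assumes each answer option maps to a single token ID, as the `tf.gather` call above does. It normalizes only the final position's logits with a numerically stable log-sum-exp, gathers the option log-probabilities, and picks the highest-scoring option.

```typescript
// Numerically stable log-softmax over a single vector of logits.
function logSoftmax(logits: number[]): number[] {
  // Subtract the max before exponentiating to avoid overflow.
  const max = logits.reduce((m, x) => (x > m ? x : m), -Infinity);
  let sumExp = 0;
  for (const x of logits) sumExp += Math.exp(x - max);
  const logSumExp = max + Math.log(sumExp);
  return logits.map((x) => x - logSumExp);
}

// Hypothetical helper: score each option from the final position's logits only.
// lastLogits: logits at position promptLength - 1 (length = vocabulary size).
// optionTokenIds: one token ID per answer option (assumes single-token options).
function scoreOptions(
  lastLogits: number[],
  optionTokenIds: number[],
): { scores: number[]; best: number } {
  const logProbs = logSoftmax(lastLogits);
  const scores = optionTokenIds.map((id) => logProbs[id]);
  // The predicted option is the one with the highest log-probability.
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return { scores, best };
}

// Toy example: 8-token vocabulary, options A-D are (hypothetical) token IDs 2, 3, 5, 7.
const toyLastLogits = [0.1, -1.2, 0.4, 2.3, 0.0, 1.1, -0.5, 0.9];
const { scores, best } = scoreOptions(toyLastLogits, [2, 3, 5, 7]);
console.log(scores, "predicted option:", "ABCD"[best]);
```

Only the 4 gathered values ever need to leave the computation. For scale: with GPT-2's 50,257-token vocabulary and, for instance, a 256-token prompt, calling `.array()` on the full `[1, seqLen, vocab]` log-prob tensor would return 256 × 50,257 = 12,865,792 numbers.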

Comment thread cli/src/evaluate_finetuned_gpt2.ts Outdated
Comment thread cli/src/evaluate_finetuned_gpt2.ts Outdated
Comment thread cli/src/measure_memorization_gpt2.ts
return output as tf.Tensor;
});

console.log("logits shape:", logits.shape);

Collaborator


Make sure to remove the debug prints before merging the PR.


2 participants